Pattern-Analysis Based Models of Masking by Spatially
Separated Sound Sources
AFOSR 91-0289
Annual Progress Report
May 15, 1992 to May 14, 1993

I. RESEARCH OBJECTIVES

The long-term goal of this program of research is to specify the mechanisms that underlie the spatial hearing abilities of humans. One of our current projects is concerned with spatial hearing performance in the presence of noise. The motivation for this research is twofold. First, we are answering a series of basic science questions concerned with the mechanisms that allow us to “hear out” and process one particular stimulus in the presence of other interfering stimuli. Second, the results bear on a series of applied questions concerning the effectiveness of three-dimensional virtual auditory displays when these displays are complex (e.g., containing many stimuli) or when they are used in a noisy environment (e.g., a cockpit). A second project is developing a model of spatial hearing. A number of monaural and binaural cues have been suggested as a potential basis for sound localization. We view the observer in a sound localization task as attempting to associate the pattern of acoustic cues received on a particular trial with a particular source direction. We are using neural network models to perform this pattern recognition task, and are attempting to determine which cues are necessary to achieve human-like performance.

II. STATUS OF THE RESEARCH

Much of the work described here is being conducted in the Auditory Localization Facility of the Armstrong Laboratory at Wright-Patterson Air Force Base. This facility contains a 14-foot-diameter geodesic sphere with 277 speakers mounted on its surface. This unique facility gives the experimenter considerable control over the spatial distribution of sound sources when conducting sound localization and free-field masking research. Additional studies are being performed in the Signal Detection Laboratory of the Department of Psychology at Wright State University. This is a more traditional psychoacoustic facility, where subjects listen to sounds presented over headphones in individual sound-attenuating booths. Some of the work described here receives additional support from Armstrong Laboratory, from a grant from the National Institutes of Health, and through cost-sharing funds from Wright State University.

A. Detection in Noise.

Free-field Masking. Our work on free-field masking replicates previous work that has shown a substantial increase in detectability when the signal and masker are spatially separated [e.g., K. Saberi, L. Dostal, T. Sadralodabai, V. Bull, and D.R. Perrott, J. Acoust. Soc. Am. 90, 1355-1370 (1991)]. However, in our work the stimulus frequency was systematically manipulated. In these studies, the detectability of a brief click-train signal in the presence of a white Gaussian noise masker was measured as a function of the spatial separation between the signal and the masker. Both the signal and the masker were band-limited to lie within low- (below 1.4 kHz), mid- (1.2 to 6.8 kHz), or high- (above 3.5 kHz) frequency ranges. The masker was located directly in front of the subject, directly to the left of the subject, or directly above the subject.
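The band-limiting described above can be illustrated with a brick-wall filter applied in the frequency domain. This is a minimal sketch, not the procedure used in the experiments: the FFT-based filter, the assumed 20-Hz lower edge of the low band, the sampling rate, the click rate, and the function names are all hypothetical; only the nominal band edges come from the text.

```python
import numpy as np

# Nominal band edges (Hz) from the text. The 20-Hz lower edge of the low band and
# the open upper edge of the high band (taken as Nyquist) are assumptions.
BANDS_HZ = {"low": (20.0, 1400.0), "mid": (1200.0, 6800.0), "high": (3500.0, None)}


def band_limit(x, fs, lo, hi=None):
    """Zero every FFT bin outside [lo, hi] Hz (brick-wall band-pass sketch)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    upper = fs / 2 if hi is None else hi
    spectrum[(freqs < lo) | (freqs > upper)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def click_train(fs, dur_s, rate_hz):
    """Unit impulses at a fixed (hypothetical) rate, lasting dur_s seconds."""
    x = np.zeros(int(round(dur_s * fs)))
    x[:: int(round(fs / rate_hz))] = 1.0
    return x


fs = 44100.0                                   # assumed sampling rate
rng = np.random.default_rng(0)
lo, hi = BANDS_HZ["high"]
# White Gaussian noise masker and click-train signal, both limited to the high band.
masker = band_limit(rng.standard_normal(8192), fs, lo, hi)
signal = band_limit(click_train(fs, 8192 / fs, 100.0), fs, lo, hi)
```

A brick-wall FFT filter is used only because it makes the band edges exact and easy to check; a real stimulus generator would more likely use filters with finite slopes and gated ramps.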

When the signal was separated from the masker in azimuth within the horizontal plane, the detectability of the signal could be increased by as much as 18 dB (see Figure 1). Increases in detectability of as much as 8 dB were observed for separations in elevation within the median plane (see Figure 2). In all cases, the increases in detectability observed for the high-frequency signal were as great as, or greater than, those observed for the low-frequency signal. Traditional models of binaural masking, based on interaural differences, did not predict the increases in detectability observed with vertical separations within the median plane, where interaural differences are relatively small. Moreover, these models seemed inadequate to explain the effects of stimulus frequency; the increase in the magnitude of the interaural level difference with increasing frequency was not great enough to predict the observed improvement in performance.

In a further attempt to relate these results to traditional headphone-based binaural masking results, continuous and gated maskers were compared. Based on the headphone results of McFadden [J. Acoust. Soc. Am. 40, 1414-1419 (1966)], who found that the “binaural release from masking” was greater with a continuous masker than with a gated masker, it might be expected that the effects of spatial separation would be greater with a continuous masker than with a gated masker. On average, we found that the signal was about 2-3 dB more detectable in the presence of a continuous masker. However, in conflict with the headphone results, these effects were not systematically related to spatial separation.

Overall, the results of these studies indicate that masking release on the order of 8 to 18 dB can be observed in free-field masking situations when the signal and the masker are spatially separated. The pattern of results from these experiments emphasizes the importance of high-frequency information. These results have important implications for display designers, indicating the nature and magnitude of the changes in detectability that can be expected when sounds are spatially separated. Portions of this work were presented at the Boston University Binaural Conference, December 1991; the fall meeting of the Acoustical Society of America in 1992; the meeting of the Human Factors Society, October 1992; and the AFOSR Review: Research on Hearing, June 1993. A proceedings paper describing some of this work has been published: Good and Gilkey (1992). A paper describing this work has been submitted: Gilkey and Good (submitted December, 1993).

Reproducible Noise Masking. We have also been using headphone-based masking experiments to test the predictions of traditional models of binaural interaction. The large masking level difference (MLD) observed between monaural and binaural tone-in-noise masking tasks has been used to suggest that quite different processing is employed under these two conditions (e.g., energy detection vs. interaural time processing). However, when Gilkey, Robinson, and Hanna [J. Acoust. Soc. Am. 78, 1207-1219 (1985)] examined the trial-by-trial responses of subjects, they found that the responses under the N0S0 (both noise and signal presented diotically) and N0Sπ (noise diotic, but signal presented 180° out of phase interaurally) conditions were highly correlated. That is, although the signal level under the N0Sπ condition is 10 to 15 dB lower than under the N0S0 condition (because of the MLD), individual reproducible noise-alone or signal-plus-noise waveforms that were likely to elicit a positive response (i.e., a report of signal present) under one condition were also likely to elicit a positive response under the other condition. Gilkey et al. used wideband reproducible noise samples as maskers. When Isabelle and Colburn [J. Acoust. Soc. Am. 89, 352-359 (1991)] examined the responses of subjects to narrowband reproducible noise samples, they found correlations that were much weaker and often negative. They attributed the differences between their data and those of Gilkey et al. to the differences in the bandwidth of the masker. However, Gilkey [Paper presented at the Midwinter Meeting of the Association for Research in Otolaryngology, February, 1990] directly compared narrowband and wideband results for the same subjects and found highly significant correlations between N0S0 and N0Sπ conditions with both wideband and narrowband maskers. The correlation between N0S0 and N0Sπ responses has significant implications for models of both monaural and binaural performance. Lateralization-based models of binaural masking are unable to predict these data. On the other hand, Gilkey et al. showed that for a model such as the Equalization-Cancellation (EC) model [N.I. Durlach, “Binaural signal detection: Equalization and cancellation theory,” in Foundations of Modern Auditory Theory II, edited by J.V. Tobias (Academic Press, New York), 371-462 (1972)] the effective maskers under the N0S0 and N0Sπ conditions are highly correlated. Thus, the observed correlation of responses between these conditions is not necessarily unexpected.
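One simple way to express the trial-by-trial comparison described above is to compute, for each reproducible waveform, the proportion of presentations on which the subject reported the signal, and then to correlate those proportions across conditions. The sketch below assumes responses are stored as a 0/1 array with one row per reproducible waveform and one column per presentation; the function names and toy data are hypothetical, and this is not necessarily the exact statistic used by Gilkey et al. or by Isabelle and Colburn.

```python
import numpy as np


def p_yes(responses):
    """Proportion of 'signal present' reports for each reproducible waveform (row)."""
    return np.asarray(responses, dtype=float).mean(axis=1)


def condition_correlation(resp_a, resp_b):
    """Pearson r between per-waveform P(yes) in two conditions (e.g., N0S0 vs N0Sπ).
    Row i of both arrays must refer to the same reproducible waveform."""
    return float(np.corrcoef(p_yes(resp_a), p_yes(resp_b))[0, 1])


# Toy data (hypothetical): 3 reproducible waveforms x 4 presentations per condition.
n0s0 = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
n0spi = [[1, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 1]]
r = condition_correlation(n0s0, n0spi)
```

In this toy case the two conditions happen to yield the same per-waveform proportions (0.5, 0.25, 0.75), so the correlation is 1; real data would of course be noisier.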

In order to evaluate the predictions of the EC model more fully, we have measured the responses of subjects to individual reproducible waveforms under conditions where the effective masker at the output of the EC device should be quite different from the masker under the N0S0 condition. Under the NuSπ condition (independent noises to the two ears, signal 180° out of phase interaurally), the EC device should subtract the stimuli arriving from the two ears, such that the effective masker at the output of the EC device is the difference between the two monaural maskers. Thus, the EC model predicts that N0S0 responses to either of the two “monaural” maskers should be only partially correlated with the NuSπ responses. This is exactly what we observed in the data of our subjects. However, when the responses to the two monaural waveforms were averaged and then compared to the NuSπ responses, a very high correlation was observed. This result was not anticipated.
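The EC prediction for the NuSπ condition can be checked numerically. In the idealized sketch below (assumptions: perfect cancellation by simple subtraction, no internal jitter or noise, independent unit-variance Gaussian maskers, a hypothetical 500-Hz tone), subtracting the right-ear waveform from the left-ear waveform doubles the antiphasic signal and leaves the difference of the two monaural maskers as the effective masker. Because corr(n_L, n_L - n_R) = σ² / (σ · √2σ) = 1/√2, each monaural masker is correlated only about 0.71 with the effective masker, which is the sense in which only a partial correlation is expected.

```python
import numpy as np


def ec_cancel(left, right):
    """Idealized EC stage for an interaurally antiphasic signal: subtract the ears.
    Equalization jitter and internal noise are omitted (an assumption)."""
    return left - right


def nu_spi_demo(n=200_000, fs=20_000.0, f_sig=500.0, amp=0.1, seed=1):
    rng = np.random.default_rng(seed)
    m_left = rng.standard_normal(n)            # independent maskers at the two ears (Nu)
    m_right = rng.standard_normal(n)
    s = amp * np.sin(2 * np.pi * f_sig * np.arange(n) / fs)   # hypothetical tone signal
    eff_masker = ec_cancel(m_left, m_right)    # effective masker: m_left - m_right
    eff_signal = ec_cancel(s, -s)              # Sπ: signal inverted at one ear -> 2 * s
    r = np.corrcoef(m_left, eff_masker)[0, 1]  # one "monaural" masker vs effective masker
    return s, eff_signal, r


s, eff_signal, r = nu_spi_demo()
```

This only addresses waveform correlation at the EC output; how responses relate to those waveforms depends on the decision stage, which the sketch does not model.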

Because we were surprised by this result, we next examined a condition where the actual masker under the N0S0 condition was the difference between the two maskers presented under the NuSπ condition. That is, the masker under the N0S0 condition was the predicted effective masker under the NuSπ condition. Rather than the strong correlation we expected between these two conditions, based on the EC model, only a weak relation was observed.

Overall, the pattern of results suggested two possibilities: either 1) the NuSπ condition is not a true binaural condition, as suggested by Durlach, Gabriel, Colburn, and Trahiotis [J. Acoust. Soc. Am. 79, 1548-1557 (1986)], or 2) the EC model is an inadequate model of binaural hearing. Parts of this work were presented at the Boston University Binaural Conference, December, 1991, and at the Midwinter Meeting of the Association for Research in Otolaryngology, February, 1993.

B. Localization in Noise.

In many situations the observer must be able to determine the direction of the sound source once it has been detected. Presumably, because the localization task is more complex than the detection task, a more complete representation of the signal information would be necessary for accurate localization. Much of our work on sound localization is represented by the thesis research of Michael D. Good. He is investigating localization accuracy in noise as a function of signal-to-noise ratio and as a function of the location of the masker.

In designing these experiments, we realized that currently available response techniques allow responses to be collected only at very slow rates (2-4 responses per minute). It was clear that if we were to collect data at these slow rates, even relatively simple experiments would take months or years to complete. Therefore, we needed to develop a technique that would allow subjects to record the perceived location of a sound accurately, at rates much faster than were possible with current techniques.
After considering a number of alternatives, we decided that a pointing technique would be the most effective procedure. It was known that subjects could accurately indicate the location of a sound by verbally reporting spherical coordinates [F.L. Wightman and D. Kistler, J. Acoust. Soc. Am. 868-878 (1989)]. Our own pilot studies indicated that, when presented with spherical coordinates, subjects could point to the corresponding location on a small plastic hemisphere to within a few degrees. We therefore developed a technique in which the subjects indicate the perceived location of a sound by pointing at an 8-in. spherical model of auditory space. The subjects point at the sphere using a magnetic stylus, whose XYZ coordinates are monitored with a Polhemus Fastrak “head tracker”; they then press a foot-switch to record their response. The results indicate that subjects are able to respond at rates of 16-19 responses per minute, considerably faster than with other techniques. Further, the accuracy of their responses is comparable to that which Wightman and Kistler observed with the verbal reporting technique. Figures 3 and 4 compare the azimuth and elevation judgment centroids of our subjects to the judgment centroids of two of the subjects of Wightman and Kistler. These results were presented at the AFOSR Review: Research on Hearing, June 1993. A paper describing this technique has been submitted [Gilkey, Good, Ericson, Brinkman, and Stewart, (submitted)].
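Converting a stylus position on the sphere into a response direction, and summarizing repeated judgments as a centroid, might look like the sketch below. The coordinate frame (+x toward the front, +y toward the listener's right, +z up, origin at the sphere's center), the definition of the judgment centroid as the direction of the mean unit vector, and all function names are assumptions; the tracker's actual output format is not modeled, and front-back reversals are not resolved here. The same helpers give the great-circle angle between a target and a response, one common way to compute an angle of error.

```python
import numpy as np


def xyz_to_az_el(x, y, z):
    """Direction (deg) of a stylus point relative to the sphere's center.
    Assumed axes: +x front, +y listener's right, +z up; negative azimuth = left."""
    az = np.degrees(np.arctan2(y, x))
    el = np.degrees(np.arctan2(z, np.hypot(x, y)))
    return az, el


def az_el_to_unit(az, el):
    """Unit vector(s) for azimuth/elevation given in degrees (same assumed axes)."""
    a, e = np.radians(az), np.radians(el)
    return np.stack([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)], axis=-1)


def judgment_centroid(az, el):
    """Direction of the mean unit vector of repeated judgments of one target."""
    mean_vec = az_el_to_unit(np.asarray(az, float), np.asarray(el, float)).mean(axis=0)
    return xyz_to_az_el(*mean_vec)


def angle_of_error(az_t, el_t, az_r, el_r):
    """Great-circle angle (deg) between target and response directions."""
    cos_angle = np.sum(az_el_to_unit(az_t, el_t) * az_el_to_unit(az_r, el_r), axis=-1)
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


# Usage: a touch on the left side of an 8-in. (4-in. radius) sphere.
az, el = xyz_to_az_el(0.0, -4.0, 0.0)          # about (-90, 0)
c_az, c_el = judgment_centroid([-10.0, 10.0], [0.0, 0.0])
```

Averaging unit vectors rather than raw angles avoids the wrap-around problem at ±180° azimuth.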

C. Neural Network Models of Sound Localization.

At least three sources of acoustic information are generally recognized as providing the foundation for sound localization: interaural time differences, interaural level differences, and direction-specific spectral modulations introduced by the acoustics of the torso, head, and pinnae. No model has been developed to describe how these disparate sources of information are combined into a single unified perception of the source location.

If the pattern of interaural time differences, interaural level differences, and spectral modulations is unique for each source direction, then the task of the observer in a localization experiment can be viewed as estimating the values of these cues and determining the location that corresponds to the estimated pattern; that is, within this view, sound localization is a pattern recognition task. Because neural networks have had great success in solving other pattern recognition problems, we have been using them to model sound localization.

In our initial investigations, the model has been composed of a preprocessing section and a neural network section. In the preprocessing stage, the click signals were convolved with head-related transfer functions (filters that simulate the acoustic effect of the torso, head, and pinnae). The filtered clicks were then corrupted by internal noise; each point on the waveform was multiplied by a random amplitude jitter and subjected to a random delay, in a manner similar to that described by Durlach [J. Acoust. Soc. Am. 35, 1206-1218 (1963)]. A broadband cross-correlation was computed between the jittered waveforms in the left and right ears, and the lag corresponding to the maximum in the cross-correlation function was one input to the network section of the model. In addition, Fast Fourier Transforms were computed from the waveforms in the left and right channels, and the logarithm of the energy in each of 22 rectangular quarter-octave bands was determined. The difference between the log spectra in the left and right ears provided 22 additional inputs to the network stage. The sound source could originate from any of 144 directions (24 azimuths × 6 elevations) ranging in azimuth from -165° to +180° and in elevation from -36° to +54°. One hundred training vectors were generated for each of the 144 source locations (a total of 14,400 training vectors). A second set of 14,400 vectors was used as a test set. There were 23 input units, 50 hidden units, and 30 output units. A fully connected feed-forward network was trained, with back-propagation, to “turn on” 1 of 6 output units to indicate which of 6 possible elevations had been presented, and 1 of 24 output units to indicate which of the 24 possible azimuths had been presented.
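A minimal sketch of the preprocessing stage follows. It takes left- and right-ear waveforms (clicks already filtered by head-related transfer functions) and produces the 23 network inputs: one cross-correlation lag plus 22 interaural band-level differences. The jitter magnitudes, the ±1-ms lag search range, the 300-Hz lower edge of the first quarter-octave band, and all function names are assumptions. Per-sample jitter is implemented here by linear interpolation at randomly perturbed sample times, which is only one reading of the description above.

```python
import numpy as np


def jitter(x, fs, rng, amp_sd=0.05, delay_sd_s=20e-6):
    """Hypothetical internal-noise stage: each sample gets its own random gain and
    random delay (amp_sd and delay_sd_s are assumed values)."""
    n = np.arange(len(x), dtype=float)
    gain = 1.0 + amp_sd * rng.standard_normal(len(x))
    delay = delay_sd_s * fs * rng.standard_normal(len(x))   # delay in samples
    return gain * np.interp(n - delay, n, x)                 # read x at jittered times


def itd_lag(left, right, fs, max_lag_s=1e-3):
    """Lag (s) of the broadband cross-correlation peak; positive means the left ear lags."""
    c = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    ok = np.abs(lags) <= round(max_lag_s * fs)
    return lags[ok][np.argmax(c[ok])] / fs


def band_log_energy(x, fs, f_lo=300.0, n_bands=22):
    """Energy (dB) in rectangular quarter-octave bands; f_lo is an assumed lower edge."""
    power = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    edges = f_lo * 2.0 ** (np.arange(n_bands + 1) / 4)
    return np.array([10 * np.log10(power[(f >= lo) & (f < hi)].sum() + 1e-20)
                     for lo, hi in zip(edges[:-1], edges[1:])])


def input_vector(left, right, fs, rng):
    """23 network inputs: 1 cross-correlation lag + 22 left-minus-right band levels."""
    jl, jr = jitter(left, fs, rng), jitter(right, fs, rng)
    ild = band_log_energy(jl, fs) - band_log_energy(jr, fs)
    return np.concatenate([[itd_lag(jl, jr, fs)], ild])


fs = 44100.0                       # assumed sampling rate
right = np.zeros(2048)
right[100] = 1.0
left = np.zeros(2048)
left[104] = 1.0                    # left ear lags by 4 samples
features = input_vector(left, right, fs, np.random.default_rng(0))
```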

Figure 5 shows a comparison of the responses of a human subject and of the model in comparable listening conditions. The top two panels show the azimuth component of the judgment centroid plotted as a function of the target azimuth. As can be seen, both the network model and the human subject made very accurate azimuth judgments. The bottom two panels show the elevation component of the judgment centroids as a function of the actual elevation. The overall performance of the human and the model is similar, but the human systematically overestimates the elevation, while the model underestimates high elevations and overestimates low elevations. (This results, in part, from the response restrictions placed on the model; that is, it cannot respond with elevations greater than 54° or less than -36°; the human was not similarly constrained.) The average angles of error for the human and the model are similar in magnitude, and the numbers of front/back reversals observed for the model and the human are also comparable.
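The network stage described earlier (23 inputs, 50 hidden units, 30 outputs split into a 6-unit elevation group and a 24-unit azimuth group) can be sketched in NumPy. The sigmoid units, squared-error loss, learning rate, weight initialization, ordering of the output units, and all names are assumptions rather than details taken from this report. Decoding picks the most active unit in each group, which illustrates why such a model can never report an elevation above 54° or below -36°.

```python
import numpy as np

AZIMUTHS = np.arange(-165, 181, 15)    # 24 azimuths (deg): -165 ... +180
ELEVATIONS = np.arange(-36, 55, 18)    # 6 elevations (deg): -36 ... +54


def make_target(az, el):
    """30-unit target: units 0-5 code elevation, units 6-29 code azimuth (assumed order)."""
    t = np.zeros(30)
    t[np.flatnonzero(ELEVATIONS == el)[0]] = 1.0
    t[6 + np.flatnonzero(AZIMUTHS == az)[0]] = 1.0
    return t


def decode(y):
    """Most active unit in each group; responses are confined to the training grid."""
    return int(AZIMUTHS[np.argmax(y[6:])]), int(ELEVATIONS[np.argmax(y[:6])])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LocalizationNet:
    """Fully connected 23-50-30 feed-forward network trained by back-propagation."""

    def __init__(self, rng, n_in=23, n_hidden=50, n_out=30):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        H = sigmoid(X @ self.W1 + self.b1)
        return H, sigmoid(H @ self.W2 + self.b2)

    def train_step(self, X, T, lr=0.5):
        """One full-batch gradient step on squared error; returns the pre-update loss."""
        H, Y = self.forward(X)
        dY = (Y - T) * Y * (1.0 - Y)            # error at output pre-activations
        dH = (dY @ self.W2.T) * H * (1.0 - H)   # back-propagated to the hidden layer
        n = len(X)
        self.W2 -= lr * H.T @ dY / n
        self.b2 -= lr * dY.mean(axis=0)
        self.W1 -= lr * X.T @ dH / n
        self.b1 -= lr * dH.mean(axis=0)
        return float(np.mean((Y - T) ** 2))


rng = np.random.default_rng(0)
net = LocalizationNet(rng)
X = rng.standard_normal((144, 23))   # stand-in for preprocessed input vectors
T = np.array([make_target(az, el) for el in ELEVATIONS for az in AZIMUTHS])
losses = [net.train_step(X, T) for _ in range(200)]
az, el = decode(net.forward(X[:1])[1][0])
```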

Wightman and Kistler [J. Acoust. Soc. Am. 91, 1648-1661 (1992)] demonstrated that, for human observers, low-frequency timing information plays a dominant role in determining localization judgments. That is, when interaural time cues provide information about the source location that conflicts with information provided by interaural level differences or “spectral cues,” subjects tend to judge the sound as coming from the location indicated by the interaural time difference, rather than from the location indicated by these other cues. Following Wightman and Kistler, we took the previously described network (trained on “normal” stimuli) and tested it on stimuli with phase spectra that had been modified to correspond to the phase spectrum of a sound coming from 0° azimuth and 0° elevation, from -45° azimuth and 0° elevation, or from 90° azimuth and 0° elevation. The pattern of errors observed for the model was quite similar to that observed for Wightman and Kistler’s human subjects. That is, in most cases, the model responded with the location indicated by the phase spectra, rather than the location indicated by the power spectra.
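Stimuli of the kind described above can be built by combining the magnitude spectrum of one head-related impulse response with the phase spectrum of another, separately for each ear. This FFT-based sketch is an assumption about how such conflicting-cue stimuli might be constructed (equal-length impulse responses, no windowing or smoothing); the function names are hypothetical.

```python
import numpy as np


def swap_phase(mag_source, phase_source):
    """Filter with the magnitude spectrum of mag_source and the phase spectrum of
    phase_source (both real impulse responses of the same length)."""
    n = len(mag_source)
    magnitude = np.abs(np.fft.rfft(mag_source))
    phase = np.angle(np.fft.rfft(phase_source, n=n))
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)


def conflicting_cue_pair(target_left, target_right, phase_left, phase_right):
    """Left/right filters whose level and spectral cues come from the target direction
    but whose phase (timing) cues come from another direction, e.g. 0°, -45°, or 90°."""
    return swap_phase(target_left, phase_left), swap_phase(target_right, phase_right)


rng = np.random.default_rng(0)
h = rng.standard_normal(256)          # stand-in for a target-direction impulse response
d = np.zeros(256)
d[5] = 1.0                            # stand-in phase source: a pure 5-sample delay
y = swap_phase(h, d)                  # |spectrum| of h, phase of a delayed impulse
```

Keeping the magnitude spectra intact while replacing only the phase means any change in the network's responses can be attributed to the timing cues.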

In this modeling effort, we use the neural network in a role similar to that of an “ideal detector”; thus, the implications of this work do not concern the structure of the neural network itself. Rather, we are determining the viability of various acoustic cues for mediating human-like performance. Thus far, this work indicates that, for the simple stimuli employed here, there is sufficient information in the binaural cues alone to produce localization performance comparable to that of humans. Portions of this work were presented at the fall meeting of the Acoustical Society of America in 1992; the Boston University Binaural Conference, December 1992; and the AFOSR Review: Research on Hearing, June 1993.

D. Laboratory Development.

Armstrong Laboratory. In support of our localization research, the new pointing response technique described previously was developed. Experiment control software was developed that incorporates this technique, along with the ability to present sounds from random directions selected from a set of up to 239 speakers. In addition, a second source (i.e., a masker) can be presented from any of 5 fixed locations.

Wright State University. Two DECsystem 3000 Model 400 workstations were purchased to provide data analysis, graphics, and modeling capabilities for the laboratory.

E. Conference on Binaural and Spatial Hearing

Considerable effort during this grant period was expended on planning the Conference on Binaural and Spatial Hearing, which was held at the Hope Hotel and Conference Center at Wright-Patterson Air Force Base in Ohio, on September 9-12, 1993. Speakers at the conference included Timothy Anderson, Leslie Bernstein, Jens Blauert, John Brugge, Thomas Buell, Mahlon Burkhard, Robert Butler, Rachel Clifton, Steven Colburn, Theodore Doll, Richard Duda, Nathaniel Durlach, Raymond Dye, Mark Ericson, Scott Foster, Robert Gilkey, Wesley Grantham, Ervin Hafter, William Hartmann, Janet Koehnke, Armin Kohlrausch, Birger Kollmeier, Gregory Kramer, Shigeyuki Kuwada, Richard McKinley, Donald Mershon, John Middlebrooks, David Perrott, Edgar Shaw, Kourosh Saberi, Barbara Shinn-Cunningham, Richard Stern, Elizabeth Wenzel, Frederic Wightman, Tom Yin, William Yost, and Eric Young.
Good, M.D., & Gilkey, R.H. (1992). Masking between spatially separated sounds. Proceedings

Papers in preparation

Gilkey, R.H., & Good, M.D. Effects of frequency on free-field masking. Submitted to Human Factors, December, 1993.

Gilkey, R.H., Good, M.D., Ericson, M.A., Brinkman, J., & Stewart, J.M. A pointing technique for rapidly collecting localization responses in auditory research. Submitted to Behavior Research Methods, Instruments, and Computers, December, 1993.

IV. PARTICIPATING PROFESSIONALS

Robert H. Gilkey

De Anza College, Cupertino, CA

University of California, Berkeley, CA B.A. 1976 Psychology

Indiana University, Bloomington, IN Ph.D. 1981 Psychology

Dissertation title: “Molecular psychophysics and models of auditory signal detectability.”

V. INTERACTIONS

Conference presentations and invited talks

Gilkey, R.H. (1993). Pattern-analysis based models of masking by spatially separated sound sources. AFOSR Review: Research in Hearing, Fairborn, OH, June.

Gilkey, R.H. (1993). Comparing predictions of the Equalization-Cancellation Model to NuSπ
R.H. Gilkey, 566-86-7642

Gilkey, R.H. (1993). Auditory space perception and virtual environments. Ohio Consortium for Virtual Environment Research, Dayton, OH, January.

Gilkey, R.H., Janko, J.A., & Anderson, T.R. (1992). Using neural nets to model sound localization. Boston University Binaural Conference, Boston, MA, December.

Gilkey, R.H., & Good, M.D. (1992). Effects of frequency and masker duration on free-field masking. Journal of the Acoustical Society of America, 92, 2334(A).

Anderson, T.R., Janko, J.A., & Gilkey, R.H. (1992). An artificial neural network model of human sound localization. Journal of the Acoustical Society of America, 92, 2298(A).

McKinley, R.L., Ericson, M., Perrott, D., Brungart, D., Gilkey, R., & Wightman, F. (1992). Minimum audible angle for synthesized localization cues presented over headphones. Journal of the Acoustical Society of America, 92, 2297(A).

Gilkey, R.H. (1992). The correlation between responses under monaural and binaural conditions. Journal of the Acoustical Society of America, 92, 2298(A).

Good, M.D., & Gilkey, R.H. (1992). Masking between spatially separated sounds. The 36th Annual Meeting of the Human Factors Society, Atlanta, GA, October.


Figure captions

Figure 1. Threshold signal-to-noise ratio is plotted as a function of the azimuth of the signal within the horizontal plane. The masker was presented from directly in front of the subject (0° azimuth, 0° elevation; left panel) or from directly to the left of the subject (-90° azimuth, 0° elevation; right panel). Threshold estimates have been averaged across the three subjects. Negative values along the abscissa indicate positions to the left of the subject, and positive values indicate positions to the right of the subject. A value of 0° indicates a speaker location directly in front of the subject. The position of the arrow shows the location of the masker.

Figure 2. Threshold signal-to-noise ratio is plotted as a function of the elevation of the signal within the median plane. The masker was presented from directly in front of the subject (0° azimuth, 0° elevation; left panel) or from directly above the subject (0° azimuth, 90° elevation; right panel). Threshold estimates have been averaged across the three subjects. Negative values along the abscissa indicate positions below the horizontal plane, and positive values indicate positions above the horizontal plane. Elevations greater than 90° indicate locations in the rear hemisphere. The position of the arrow shows the location of the masker.

Figure 3. The azimuth coordinate of the judgment centroid for each target location is plotted as a function of the azimuth of the target. The top three panels show data for each of the 3 subjects in our experiment; they responded with the pointing technique. The bottom two panels show data for 2 of the subjects of Wightman and Kistler; they responded verbally. (The panel on the bottom left shows data from one of their better subjects, and the panel on the bottom right shows data from one of their worst subjects.) The centroids in the top panels are based on 8 judgments at each speaker location. The centroids in the bottom panels are based on either 6 or 12 judgments at each speaker location. Front-back reversals have been resolved.

Figure 4. The elevation coordinate of the judgment centroid for each target location is plotted as a function of the target elevation. Note that the range of values on the axes has been reduced substantially relative to Figure 3. Other details are as in Figure 3.

Figure 5. The performance of subject SDO, from the study of Wightman and Kistler (1989), is compared with the performance of a neural network receiving only binaural input. In the top two panels, the azimuth coordinate of the judgment centroid is plotted as a function of the target azimuth. In the bottom two panels, the elevation coordinate of the judgment centroid is plotted as a function of the target elevation. The panels on the left show the results for subject SDO. The panels on the right show the results for the neural network.